For legged robots to match the agility of humans and animals, they must not only produce robust periodic walking and running, but also seamlessly switch between nominal locomotion gaits and more specialized transient maneuvers. Despite recent advancements in the control of bipedal robots, little focus has been devoted to generating highly dynamic behaviors. Recent work applying reinforcement learning to produce control policies for legged robots has shown success in generating robust walking behaviors; however, these learned policies have difficulty expressing a multitude of different behaviors in a single network. Inspired by conventional optimization-based control techniques for legged robots, this work applies a recurrent policy to execute four-step, 90-degree turns, trained using reference data generated from optimized single rigid body model trajectories. We present a novel training framework that uses epilogue terminal rewards to learn specific behaviors from pre-computed trajectory data and demonstrate successful transfer to hardware on the bipedal robot Cassie.
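As a hedged illustration of the training signal described above (the state layout, weights, and reference format are assumptions made for the sketch, not the paper's implementation), a per-step reward tracking the reference trajectory can be combined with a sparse terminal reward that scores the completed maneuver:

```python
import numpy as np

def step_reward(qpos, qvel, ref_qpos, ref_qvel, w_pos=0.6, w_vel=0.4):
    """Per-step tracking reward: penalize deviation from the reference
    trajectory's joint positions and velocities (weights are illustrative)."""
    pos_err = np.sum((qpos - ref_qpos) ** 2)
    vel_err = np.sum((qvel - ref_qvel) ** 2)
    return w_pos * np.exp(-5.0 * pos_err) + w_vel * np.exp(-0.1 * vel_err)

def terminal_reward(final_heading, target_heading=np.pi / 2, scale=10.0):
    """Sparse reward granted only at the end of the maneuver, here rewarding
    how close the final heading is to the commanded 90-degree turn."""
    err = np.arctan2(np.sin(final_heading - target_heading),
                     np.cos(final_heading - target_heading))
    return scale * np.exp(-2.0 * abs(err))
```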
In this work, we present a method for generating reduced-order model reference trajectories for a general class of highly dynamic maneuvers for bipedal robots, to be used in sim-to-real reinforcement learning. Our approach uses a single rigid body model (SRBM) to optimize libraries of trajectories offline, which serve as expert references in the reward function of a learned policy. This method translates the model's dynamic rotational and translational behavior to a full-order robot model and successfully transfers to real hardware. The simplicity of the SRBM allows for fast iteration and refinement of behaviors, while the robustness of learning-based controllers allows highly dynamic motions to be transferred to hardware. In this work, we introduce a set of transferability constraints that amend the SRBM dynamics to the actual bipedal robot hardware, our framework for creating optimal trajectories for a variety of highly dynamic maneuvers, and our approach to integrating the reference trajectories into a reinforcement learning policy for high-speed running. We validate our methods on the bipedal robot Cassie, on which we successfully demonstrate highly dynamic grounded running gaits up to 3.0 m/s.
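For intuition, the SRBM reduces the robot to a single lumped mass and inertia driven only by foot contact forces. A minimal continuous-time sketch follows; the mass and inertia values are illustrative placeholders, not Cassie's actual parameters:

```python
import numpy as np

def skew(w):
    """Map a vector to its 3x3 skew-symmetric (cross-product) matrix."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def srbm_dynamics(p, v, R, omega, foot_positions, foot_forces,
                  m=31.0, I=np.diag([0.25, 0.35, 0.30]), g=9.81):
    """Single rigid body model: one lumped mass with body-frame inertia I,
    actuated only through ground reaction forces applied at the feet."""
    f_total = sum(foot_forces)                       # net contact force (world)
    p_dot = v
    v_dot = f_total / m + np.array([0.0, 0.0, -g])   # Newton: translation
    # Euler: torque about the center of mass, rotated into the body frame
    tau_w = sum(np.cross(r - p, f) for r, f in zip(foot_positions, foot_forces))
    tau_b = R.T @ tau_w
    omega_dot = np.linalg.solve(I, tau_b - np.cross(omega, I @ omega))
    R_dot = R @ skew(omega)                          # attitude kinematics
    return p_dot, v_dot, R_dot, omega_dot
```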
In this paper, we investigate whether applying ankle torques during mid-stance can be a more effective way to reduce the energetic cost of locomotion than actuating leg length alone. Ankles are useful in human gaits for many reasons, including static balancing. In this work, we specifically avoid the heel-strike and toe-off benefits in order to study whether a heel-to-toe progression of the center of pressure during mid-stance is beneficial in itself. We use an "Ankle Actuated Spring Loaded Inverted Pendulum" model to simulate the shifting center of pressure dynamics, and we apply trajectory optimization to find limit cycles that minimize the cost of transport. The results show that, for the vast majority of gaits, ankle torques do not affect the cost of transport. Ankles reduce the cost of transport only during a narrow band of gaits at the transition from grounded running to aerial running. This suggests that applying ankle torque during the mid-stance of a steady gait is not a directly beneficial strategy, and that the benefits of ankles most likely lie instead in heel-strike and toe-off.
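For reference, the objective minimized by such limit-cycle optimizations is the dimensionless cost of transport. A common positive-mechanical-work form (a sketch; the paper's exact actuator work model may differ) is

$$\mathrm{CoT} = \frac{\int_0^T \big( [P_{\text{leg}}(t)]^{+} + [P_{\text{ankle}}(t)]^{+} \big)\, dt}{m \, g \, d},$$

where $[\cdot]^{+}$ keeps only positive (motor) power, $m$ is body mass, $g$ is gravitational acceleration, and $d$ is the distance traveled per cycle.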
This paper presents the design and control of a support and recovery system for use with planar legged robots. The system operates in three modes. First, it can operate in a fully transparent mode, in which no forces are applied to the robot; in this mode, the system follows the robot so that it can quickly catch it if needed. Second, it can provide a vertical supportive force to assist the robot during operation. Third, it can catch the robot and pull it away from the ground after a failure, avoiding falls and the associated damage. In this mode, the system automatically resets the robot after a trial, allowing multiple consecutive trials to be run without manual intervention. Supportive forces are applied to the robot through an actuated cable and pulley system that uses series elastic actuation to enable truly transparent operation. The nonlinear nature of this system requires careful controller design to ensure predictable and safe behavior. In this paper, we present the mechatronic design of the recovery system, develop a suitable controller, and evaluate the system's performance on the bipedal robot RAMone.
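A minimal sketch of the force-control idea behind a cable-driven series elastic actuator: cable tension is inferred from spring deflection, and the winch is servoed to track a desired force, so commanding zero force yields transparent operation. The gains, spring constant, and interface names here are illustrative assumptions, not the paper's controller:

```python
class SeriesElasticForceController:
    """PD force control for a cable-driven series elastic actuator.

    Cable tension is estimated from spring deflection (F = k * x), so
    commanding zero force produces a 'transparent' mode in which the winch
    pays cable in and out to keep the spring unloaded.
    """

    def __init__(self, spring_k=2000.0, kp=0.8, kd=0.02):
        self.k = spring_k   # spring stiffness [N/m] (illustrative)
        self.kp = kp        # proportional gain on force error
        self.kd = kd        # derivative gain on force error
        self.prev_err = 0.0

    def update(self, desired_force, spring_deflection, dt):
        measured_force = self.k * spring_deflection   # Hooke's law estimate
        err = desired_force - measured_force
        d_err = (err - self.prev_err) / dt
        self.prev_err = err
        # Winch velocity command: reel in to raise tension, pay out to lower it
        return self.kp * err + self.kd * d_err
```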
Existing automated techniques for software documentation typically attempt to reason between two main sources of information: code and natural language. However, this reasoning process is often complicated by the lexical gap between more abstract natural language and more structured programming languages. One potential bridge for this gap is the Graphical User Interface (GUI), as GUIs inherently encode salient information about underlying program functionality into rich, pixel-based data representations. This paper offers one of the first comprehensive empirical investigations into the connection between GUIs and functional, natural language descriptions of software. First, we collect, analyze, and open source a large dataset of functional GUI descriptions consisting of 45,998 descriptions for 10,204 screenshots from popular Android applications. The descriptions were obtained from human labelers and underwent several quality control mechanisms. To gain insight into the representational potential of GUIs, we investigate the ability of four Neural Image Captioning models to predict natural language descriptions of varying granularity when provided a screenshot as input. We evaluate these models quantitatively, using common machine translation metrics, and qualitatively through a large-scale user study. Finally, we offer learned lessons and a discussion of the potential shown by multimodal models to enhance future techniques for automated software documentation.
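To make the quantitative evaluation concrete, here is a sketch of the kind of machine-translation-style scoring described above, using corpus-level BLEU from NLTK; the GUI descriptions are invented examples, not items from the dataset:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Hypothetical model outputs vs. human-written GUI descriptions (invented)
references = [
    [["a", "login", "screen", "with", "username", "and", "password", "fields"]],
    [["a", "settings", "page", "listing", "notification", "options"]],
]
hypotheses = [
    ["a", "login", "screen", "with", "two", "text", "fields"],
    ["a", "settings", "screen", "with", "notification", "options"],
]

# Smoothing is commonly applied for short captions to avoid zero n-gram counts
smooth = SmoothingFunction().method1
score = corpus_bleu(references, hypotheses, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```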
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
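As a hedged sketch of the generic generative-replay pattern that GarDA builds on (not its actual algorithm; `generator.sample` and the entropy loss are stand-ins chosen for illustration), each update mixes unlabeled images from the new domain with images replayed from a generator trained on earlier domains:

```python
import torch

def continual_adaptation_step(segmenter, generator, new_images, optimizer):
    """One generic generative-replay update: the replayed batch stands in
    for previously seen domains, so no stored data is required."""
    replayed = generator.sample(len(new_images))   # hypothetical interface
    batch = torch.cat([new_images, replayed], dim=0)

    preds = segmenter(batch)                       # (B, C, H, W) logits
    # Unsupervised objective: entropy minimization is one common choice
    probs = torch.softmax(preds, dim=1)
    loss = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```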
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
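A minimal sketch of the masked modeling objective described above (the interface names and the mask-token id are assumptions, not Muse's actual code): randomly mask discrete image tokens, condition on the frozen LLM's text embedding, and predict the original token ids at the masked positions only:

```python
import torch
import torch.nn.functional as F

def masked_token_loss(transformer, image_tokens, text_embedding, mask_ratio=0.5):
    """Masked token modeling: cross-entropy on masked positions only."""
    b, n = image_tokens.shape
    mask = torch.rand(b, n, device=image_tokens.device) < mask_ratio

    MASK_ID = 8192                                 # special [MASK] id (assumed)
    inputs = image_tokens.masked_fill(mask, MASK_ID)

    logits = transformer(inputs, text_embedding)   # (b, n, vocab_size)
    # Unmasked tokens give no training signal in this objective
    return F.cross_entropy(logits[mask], image_tokens[mask])
```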
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. We give an $(\epsilon_{\text{dp}}, \delta)$-differentially private algorithm which, given $n$ samples of Lipschitz loss functions, obtains near-optimal optimization error and makes $\min(n, n^2\epsilon_{\text{dp}}^2 d^{-1}) + \min(n^{4/3}\epsilon_{\text{dp}}^{1/3}, (nd)^{2/3}\epsilon_{\text{dp}}^{-1})$ queries to the gradients of these functions. In the regime $d \le n \epsilon_{\text{dp}}^{2}$, where privacy comes at no cost in terms of the optimal loss up to constants, our algorithm uses $n + (nd)^{2/3}\epsilon_{\text{dp}}^{-1}$ queries and improves recent advancements of [KLL21, AFKT21]. In the moderately low-dimensional setting $d \le \sqrt n \epsilon_{\text{dp}}^{3/2}$, our query complexity is near-linear.
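For intuition, the standard Gaussian-smoothing identity underlying such convolution-based estimators (ReSQue's reweighting refines this basic fact) is

$$\hat{f}_\rho(x) := (f * \gamma_\rho)(x) = \mathbb{E}_{z \sim \mathcal{N}(0,\rho^2 I_d)}\big[f(x+z)\big], \qquad \nabla \hat{f}_\rho(x) = \mathbb{E}_{z \sim \mathcal{N}(0,\rho^2 I_d)}\big[\nabla f(x+z)\big],$$

so sampling $z$ and querying a stochastic gradient at the perturbed point $x + z$ yields an unbiased estimate of the smoothed gradient.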
Learning efficient and interpretable policies has been a challenging task in reinforcement learning (RL), particularly in the visual RL setting with complex scenes. While neural networks have achieved competitive performance, the resulting policies are often over-parameterized black boxes that are difficult to interpret and deploy efficiently. More recent symbolic RL frameworks have shown that high-level domain-specific programming logic can be designed to handle both policy learning and symbolic planning. However, these approaches rely on coded primitives with little feature learning, and when applied to high-dimensional visual scenes, they can suffer from scalability issues and perform poorly when images have complex object interactions. To address these challenges, we propose \textit{Differentiable Symbolic Expression Search} (DiffSES), a novel symbolic learning approach that discovers discrete symbolic policies using partially differentiable optimization. By using object-level abstractions instead of raw pixel-level inputs, DiffSES is able to leverage the simplicity and scalability advantages of symbolic expressions, while also incorporating the strengths of neural networks for feature learning and optimization. Our experiments demonstrate that DiffSES is able to generate symbolic policies that are simpler and more scalable than state-of-the-art symbolic RL methods, with a reduced amount of symbolic prior knowledge.
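To make "symbolic policy over object-level abstractions" concrete, here is an invented toy example (not a DiffSES output): a short, human-readable expression maps extracted object features directly to a discrete action, and can be read and audited in a way a pixel-level neural policy cannot:

```python
def symbolic_policy(agent_x, agent_y, target_x, target_y):
    """A toy symbolic policy over object-level features (detected object
    coordinates): move along the axis with the larger remaining distance."""
    dx, dy = target_x - agent_x, target_y - agent_y
    if abs(dx) > abs(dy):
        return 0 if dx > 0 else 1   # action: move right / left
    return 2 if dy > 0 else 3       # action: move up / down
```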
Recent years have seen a proliferation of research on adversarial machine learning. Numerous papers demonstrate powerful algorithmic attacks against a wide variety of machine learning (ML) models, and numerous other papers propose defenses that can withstand most attacks. However, abundant real-world evidence suggests that actual attackers use simple tactics to subvert ML-driven systems, and as a result security practitioners have not prioritized adversarial ML defenses. Motivated by the apparent gap between researchers and practitioners, this position paper aims to bridge the two domains. We first present three real-world case studies from which we can glean practical insights unknown or neglected in research. Next we analyze all adversarial ML papers recently published in top security conferences, highlighting positive trends and blind spots. Finally, we state positions on precise and cost-driven threat modeling, collaboration between industry and academia, and reproducible research. We believe that our positions, if adopted, will increase the real-world impact of future endeavours in adversarial ML, bringing both researchers and practitioners closer to their shared goal of improving the security of ML systems.